Impact of Adult Health-Related Practices on Juvenile Health Outcomes
Introduction
The health behaviors of adults in a community can have a profound impact on their children’s health outcomes. Children’s quality and length of life can be affected by the behaviors of the adults around them, such as smoking, drinking, poor diet, and physical inactivity. Better insight into the relationships between these variables can help identify intergenerational health impacts, support early intervention and the prevention of adverse juvenile outcomes, and inform new policy and education. In this report, we analyze how a variety of adult health behaviors relate to two juvenile outcomes, child mortality rate and low birthweight, at the county level.
We chose low birthweight as a measure of juvenile outcomes because of its associated risk of later health complications and death. According to the original data source, “infants born with low birthweight have approximately 20 times greater chance of dying than those with normal birth weight” (County Health Rankings 2024). Moreover, low-birthweight infants who survive “may face adverse health outcomes such as decreased growth, lower IQ, impaired language development, and chronic conditions (e.g., obesity, diabetes, cardiovascular disease) during adulthood” (County Health Rankings 2024). We also chose child mortality rate as a juvenile outcome: although childhood mortality is relatively rare, mortality rates capture the most severe consequence of negative health behaviors and can point to the behaviors that most need to be addressed.
The Data
The data come from County Health Rankings & Roadmaps, a program of the University of Wisconsin Population Health Institute. The 2024 dataset includes county-level health and socioeconomic data across the United States. Our two juvenile health outcomes are measured as follows: child mortality rate is the number of deaths among residents under age 20 per 100,000 population, and low birthweight is the percentage of live births weighing less than 2,500 grams. For this report, we focused primarily on the following adult health behaviors and environments:
- Insufficient sleep: percentage of adults who report fewer than 7 hours of sleep on average
- Excessive drinking: percentage of adults reporting binge or heavy drinking
- Adult smoking: percentage of adults who are current smokers
- Food insecurity: percentage of adults who lack adequate access to food
- Uninsured: percentage of the adult population under age 65 without health insurance
- Sexually transmitted infections: number of newly diagnosed chlamydia cases per 100,000 population
- Physical inactivity: percentage of adults reporting no leisure-time physical activity
- Adult obesity: percentage of the adult population that reports a body mass index greater than or equal to 30 kg/m²
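Both outcome measures are simple ratios. Below is a minimal sketch of that arithmetic. It assumes a county's raw count and its population (or birth total) are available. The helper names `rate_per_100k()` and `share_low_birthweight()` are hypothetical, and the numbers are illustrative, not taken from the dataset. Judging by the scale of the regression output later in this report, the percentage measures appear to be stored as proportions (for example, 0.08 rather than 8), so the second helper returns a proportion.

```r
# Hypothetical helper: deaths per 100,000 residents
rate_per_100k <- function(deaths, population) {
  deaths / population * 100000
}

# Hypothetical helper: share of live births weighing under 2,500 grams,
# returned as a proportion (multiply by 100 for a percentage)
share_low_birthweight <- function(births_under_2500g, live_births) {
  births_under_2500g / live_births
}

# Illustrative values only, not from the dataset
child_rate <- rate_per_100k(deaths = 13, population = 20000)
lbw_share  <- share_low_birthweight(births_under_2500g = 90, live_births = 1000)
child_rate  # 65 deaths per 100,000
lbw_share   # 0.09, i.e. 9% of live births
```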
Exploratory Data Analysis & Cleaning
We performed initial pre-processing and cleaning of the dataset before investigating the variables. The following scatterplots show the relationship between child mortality rate and three adult behaviors: adult obesity, adult smoking, and excessive drinking. Excessive drinking shows the weakest association with child mortality. In contrast, counties with higher percentages of adult smoking and adult obesity tend to have higher child mortality rates, which suggests that these two behaviors are the most strongly correlated with child mortality. Of the three, adult smoking has the highest correlation coefficient with child mortality, at 0.58.
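Correlation coefficients like the 0.58 above are typically Pearson correlations computed on counties with non-missing values. The sketch below shows that computation in base R. The column names mimic the snake_case names in the regression output later in the report, but `child_mortality_raw_value` is an assumed name. The data are simulated so that the block runs on its own.

```r
# Simulated stand-in data (not the county dataset)
set.seed(1)
n <- 200
counties <- data.frame(
  adult_smoking_raw_value      = runif(n, 0.10, 0.30),
  adult_obesity_raw_value      = runif(n, 0.25, 0.45),
  excessive_drinking_raw_value = runif(n, 0.10, 0.25)
)
counties$child_mortality_raw_value <-
  40 + 150 * counties$adult_smoking_raw_value + rnorm(n, sd = 10)
counties$child_mortality_raw_value[sample(n, 10)] <- NA  # mimic suppressed values

predictors <- c("adult_smoking_raw_value", "adult_obesity_raw_value",
                "excessive_drinking_raw_value")

# Pearson correlation (cor()'s default) with child mortality;
# use = "complete.obs" drops counties where either value is missing
cors <- sapply(predictors, function(p)
  cor(counties[[p]], counties$child_mortality_raw_value, use = "complete.obs"))
sort(round(cors, 2), decreasing = TRUE)
```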
The following scatterplots show the relationship between low birthweight and four adult health behaviors: adult smoking, insufficient sleep, adult obesity, and physical inactivity. All four show a strong positive correlation with low birthweight, and insufficient sleep has the strongest relationship of the four.
The graph below shows the percentage of low-birthweight births in each county, grouped by state and colored by the percentage of the county population reporting insufficient sleep. Red points represent counties where over 75% of the population reports insufficient sleep. In general, counties with a higher percentage of low-birthweight births also have a higher percentage of residents reporting insufficient sleep. The red points cluster on the right side of the graph, which indicates an association between low birthweight and insufficient sleep.
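The next block is a base-R sketch of this kind of graph. It assumes one row per county, with hypothetical columns `state`, `low_birthweight_raw_value`, and `insufficient_sleep_raw_value`. The data are simulated, and `cutoff` is a placeholder to be replaced with the report's own threshold. Comparing group means next to the plot gives a quick numeric check of the visual pattern.

```r
# Simulated stand-in data, one row per county (not the report's dataset)
set.seed(2)
n <- 300
counties <- data.frame(
  state = sample(c("AL", "AK", "AZ"), n, replace = TRUE),
  insufficient_sleep_raw_value = runif(n, 0.25, 0.50)
)
counties$low_birthweight_raw_value <-
  0.02 + 0.15 * counties$insufficient_sleep_raw_value + rnorm(n, sd = 0.01)

cutoff <- 0.40  # hypothetical placeholder; substitute the report's own threshold
high_sleep <- counties$insufficient_sleep_raw_value > cutoff
point_col  <- ifelse(high_sleep, "red", "grey50")

# x = low birthweight, y = state (jittered so points do not overlap)
state_f <- factor(counties$state)
plot(counties$low_birthweight_raw_value,
     jitter(as.integer(state_f), amount = 0.2),
     col = point_col, pch = 19, yaxt = "n",
     xlab = "Low birthweight (proportion of live births)", ylab = "State")
axis(2, at = seq_along(levels(state_f)), labels = levels(state_f), las = 1)

# Numeric check of the visual pattern: mean low birthweight by flag
group_means <- tapply(counties$low_birthweight_raw_value, high_sleep, mean)
group_means
```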
Methods and Results
To predict low birthweight in counties across the United States, we evaluated three models: lasso regression, ridge regression, and linear regression. Lasso and ridge regression were included to assess the effect of regularization. Both add a shrinkage penalty on the coefficients, and the lasso penalty can also perform variable selection by shrinking some coefficients exactly to zero. The candidate predictors were adult smoking, adult obesity, food insecurity, excessive drinking, physical inactivity, insufficient sleep, sexually transmitted infections, and the percentage uninsured. In all cases, we kept only observations with complete data. For lasso and ridge regression, we used the glmnet package in R with standardized predictors. We selected 𝜆 by cross-validation with cv.glmnet(), which uses 10 folds by default, setting alpha = 1 for lasso and alpha = 0 for ridge. The plot below shows how the coefficient estimates respond to different values of 𝜆 under lasso and ridge regression. The solid line marks the value of 𝜆 that gives the minimum mean cross-validated error. The dashed line marks the largest value of 𝜆 whose cross-validated error is within one standard error of that minimum.
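The block below is a minimal sketch of the lasso and ridge workflow described above. It assumes the glmnet package is installed. The predictor matrix and response are simulated stand-ins with shortened, hypothetical column names. It shows the alpha settings, the two cross-validated choices of 𝜆, and how to mark them on a coefficient-path plot.

```r
# Assumes the glmnet package is installed; data are simulated stand-ins
library(glmnet)

set.seed(3)
n <- 500
x <- matrix(runif(n * 4), ncol = 4,
            dimnames = list(NULL, c("adult_smoking", "insufficient_sleep",
                                    "excessive_drinking", "food_insecurity")))
y <- 0.04 + 0.16 * x[, "insufficient_sleep"] -
  0.17 * x[, "excessive_drinking"] + rnorm(n, sd = 0.013)

# alpha = 1 is the lasso penalty, alpha = 0 is ridge.
# cv.glmnet() defaults to nfolds = 10 and standardize = TRUE.
cv_lasso <- cv.glmnet(x, y, alpha = 1)
cv_ridge <- cv.glmnet(x, y, alpha = 0)

# lambda.min: lambda with the minimum mean cross-validated error
# lambda.1se: largest lambda whose error is within one SE of that minimum
c(min = cv_lasso$lambda.min, one_se = cv_lasso$lambda.1se)
coef(cv_lasso, s = "lambda.1se")

# Coefficient paths against log(lambda), with lambda.min (solid)
# and lambda.1se (dashed) marked
plot(cv_lasso$glmnet.fit, xvar = "lambda")
abline(v = log(cv_lasso$lambda.min), lty = 1)
abline(v = log(cv_lasso$lambda.1se), lty = 2)
```

glmnet does not accept missing values in `x`, so incomplete rows have to be dropped first, for example with `complete.cases()`. Calling `plot(cv_lasso)` instead draws the cross-validation error curve with both 𝜆 values marked.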
# A tibble: 3,196 × 770
`State FIPS Code` `County FIPS Code` `5-digit FIPS Code` `State Abbreviation`
<chr> <chr> <chr> <chr>
1 statecode countycode fipscode state
2 00 000 00000 US
3 01 000 01000 AL
4 01 001 01001 AL
5 01 003 01003 AL
6 01 005 01005 AL
7 01 007 01007 AL
8 01 009 01009 AL
9 01 011 01011 AL
10 01 013 01013 AL
# ℹ 3,186 more rows
# ℹ 766 more variables: Name <chr>, `Release Year` <chr>,
# `County Clustered (Yes=1/No=0)` <chr>, `Premature Death raw value` <chr>,
# `Premature Death numerator` <chr>, `Premature Death denominator` <chr>,
# `Premature Death CI low` <chr>, `Premature Death CI high` <chr>,
# `Premature Death flag (0 = No Flag/1=Unreliable/2=Suppressed)` <chr>,
# `Premature Death (AIAN)` <chr>, `Premature Death CI low (AIAN)` <chr>, …
Call:
lm(formula = low_birthweight_raw_value ~ adult_smoking_raw_value +
adult_obesity_raw_value + food_insecurity_raw_value + excessive_drinking_raw_value +
physical_inactivity_raw_value + insufficient_sleep_raw_value +
sexually_transmitted_infections_raw_value + uninsured_raw_value,
data = healthdata_subset)
Residuals:
Min 1Q Median 3Q Max
-0.089216 -0.007961 -0.000980 0.007360 0.068498
Coefficients:
Estimate Std. Error t value
(Intercept) 4.214e-02 5.667e-03 7.436
adult_smoking_raw_value -6.893e-02 1.362e-02 -5.062
adult_obesity_raw_value -3.109e-03 1.004e-02 -0.310
food_insecurity_raw_value 7.278e-02 1.487e-02 4.895
excessive_drinking_raw_value -1.738e-01 1.541e-02 -11.273
physical_inactivity_raw_value 4.167e-02 1.347e-02 3.093
insufficient_sleep_raw_value 1.645e-01 1.254e-02 13.124
sexually_transmitted_infections_raw_value 2.684e-05 1.171e-06 22.931
uninsured_raw_value -1.427e-02 7.117e-03 -2.005
Pr(>|t|)
(Intercept) 1.53e-13 ***
adult_smoking_raw_value 4.53e-07 ***
adult_obesity_raw_value 0.75692
food_insecurity_raw_value 1.06e-06 ***
excessive_drinking_raw_value < 2e-16 ***
physical_inactivity_raw_value 0.00201 **
insufficient_sleep_raw_value < 2e-16 ***
sexually_transmitted_infections_raw_value < 2e-16 ***
uninsured_raw_value 0.04509 *
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 0.01314 on 2006 degrees of freedom
Multiple R-squared: 0.5532, Adjusted R-squared: 0.5514
F-statistic: 310.5 on 8 and 2006 DF, p-value: < 2.2e-16
To compare the models, we used 10-fold cross-validation to estimate each model's root mean square error (RMSE). The linear regression model had the lowest cross-validated RMSE and was therefore chosen.
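The report does not show its cross-validation code. The block below is a minimal base-R sketch of 10-fold cross-validated RMSE for a linear model. The helper `cv_rmse_lm()` is hypothetical, it averages the per-fold RMSEs (one common convention), and the data are simulated.

```r
# Minimal k-fold cross-validated RMSE for a linear model, base R only.
# cv_rmse_lm() is a hypothetical helper; it averages the per-fold RMSEs.
cv_rmse_lm <- function(formula, data, k = 10, seed = 1) {
  set.seed(seed)
  # Assign each row to one of k folds at random, as evenly as possible
  folds <- sample(rep(seq_len(k), length.out = nrow(data)))
  response <- all.vars(formula)[1]
  fold_rmse <- vapply(seq_len(k), function(f) {
    train <- data[folds != f, , drop = FALSE]
    test  <- data[folds == f, , drop = FALSE]
    fit   <- lm(formula, data = train)
    pred  <- predict(fit, newdata = test)
    sqrt(mean((test[[response]] - pred)^2))
  }, numeric(1))
  mean(fold_rmse)
}

# Simulated stand-in data (not the county dataset)
set.seed(4)
n <- 400
d <- data.frame(x1 = runif(n), x2 = runif(n), x3 = runif(n))
d$y <- 0.04 + 0.16 * d$x1 - 0.17 * d$x2 + rnorm(n, sd = 0.013)

rmse_full  <- cv_rmse_lm(y ~ x1 + x2 + x3, d)
rmse_small <- cv_rmse_lm(y ~ x3, d)
c(full = rmse_full, small = rmse_small)
```

Fixing the seed inside the helper gives every model the same fold assignment, so the RMSE differences reflect the models rather than the random split.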
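Using the linear model, we obtained coefficient estimates for each predictor. The graph below shows the strength and direction of each predictor's association with low birthweight. Insufficient sleep (positive) and excessive drinking (negative) have the largest coefficient estimates, so by this measure they are the strongest predictors of low birthweight. Coefficient sizes are only comparable among predictors on the same scale. Sexually transmitted infections, for example, are measured per 100,000 population, which is why that coefficient is tiny even though it has the largest t value. The Discussion section of this report explores why these might be the strongest predictors.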
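Raw coefficients depend on each predictor's units. In the regression output above, sexually transmitted infections (a rate per 100,000) has a tiny estimate but the largest t value. One common way to compare predictors on a common footing is to refit the model after standardizing every variable. The sketch below does this on simulated data with hypothetical column names. It is an alternative presentation, not the method used for the graph.

```r
# Standardized ("beta") coefficients: refit the model after z-scoring the
# response and every predictor, so each slope is in SD units.
# Simulated data; column names are hypothetical stand-ins.
set.seed(5)
n <- 500
d <- data.frame(
  insufficient_sleep = runif(n, 0.25, 0.50),  # proportion of adults
  sti_per_100k       = runif(n, 100, 1200)    # rate per 100,000
)
d$low_birthweight <- 0.02 + 0.16 * d$insufficient_sleep +
  2.7e-05 * d$sti_per_100k + rnorm(n, sd = 0.01)

raw_fit <- lm(low_birthweight ~ insufficient_sleep + sti_per_100k, data = d)
std_fit <- lm(low_birthweight ~ insufficient_sleep + sti_per_100k,
              data = as.data.frame(scale(d)))

# Raw slopes differ by orders of magnitude; standardized slopes are comparable
round(cbind(raw = coef(raw_fit)[-1], standardized = coef(std_fit)[-1]), 5)

# t statistics are unchanged by rescaling, so they rank predictors the same way
t_raw <- summary(raw_fit)$coefficients[-1, "t value"]
t_std <- summary(std_fit)$coefficients[-1, "t value"]
```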
To investigate the relationship between the predictor variables and child mortality, we used a decision tree. This method lets us examine how the variables interact with one another. The tree structure also shows clearly how different variables influence the outcome, which makes the model easy to interpret. In addition, the decision tree automatically selects the most informative variable to split on at each node. The root node, the topmost “box” in the tree, is where the entire dataset begins to be divided. The numbers in each box are key to reading the tree: the number at the top of each node is the average child mortality rate for the counties in that node, and the percentage below it is the share of the data that reaches that node. Because the root node represents the entire dataset, its “64” means that the average child mortality rate across the entire dataset is 64 per 100,000, and the “100” below it means that the node contains 100% of the data. The tree is read from top to bottom, following the inequality listed below each node.
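The report does not name its tree package. The sketch below assumes rpart, a recommended package included with standard R installations, and fits a regression tree on simulated data with hypothetical column names. It shows where the per-node numbers come from: each node's mean response (`yval`) and its share of the rows. The root node's mean is simply the overall mean.

```r
# Assumes the rpart package; data and column names are simulated placeholders
library(rpart)

set.seed(6)
n <- 600
d <- data.frame(
  adult_smoking       = runif(n, 0.10, 0.30),
  adult_obesity       = runif(n, 0.25, 0.45),
  physical_inactivity = runif(n, 0.15, 0.35)
)
d$child_mortality <- 30 + 200 * d$adult_smoking + 60 * d$adult_obesity +
  rnorm(n, sd = 10)

tree <- rpart(child_mortality ~ ., data = d, method = "anova")

# One row per node: splitting variable ("<leaf>" for terminal nodes),
# mean child mortality in the node, and percent of all rows reaching it
node_summary <- data.frame(
  split_var      = tree$frame$var,
  mean_mortality = round(tree$frame$yval, 1),
  pct_of_data    = round(100 * tree$frame$n / nrow(d))
)
head(node_summary)

# The root node (first row) holds every row, and its yval is the overall mean
c(root_mean = tree$frame$yval[1], overall_mean = mean(d$child_mortality))
print(tree)  # text version: split rule, n, deviance, and mean for each node
```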
The decision tree was then used to generate a variable importance plot, which pinpoints the predictor variables most critical in determining child mortality. As shown below, this plot identified adult smoking, adult obesity, and physical inactivity as the best predictors of child mortality.
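The report does not say how its importance plot was produced. The sketch below assumes an rpart tree, as in a typical R workflow, and reads the `variable.importance` element of the fitted object. The data are simulated with hypothetical column names. rpart's importance also credits surrogate splits, so a variable can rank highly without appearing as a primary split in the drawn tree.

```r
# Assumes the rpart package; data and column names are simulated placeholders
library(rpart)

set.seed(6)
n <- 600
d <- data.frame(
  adult_smoking       = runif(n, 0.10, 0.30),
  adult_obesity       = runif(n, 0.25, 0.45),
  physical_inactivity = runif(n, 0.15, 0.35)
)
d$child_mortality <- 30 + 200 * d$adult_smoking + 60 * d$adult_obesity +
  rnorm(n, sd = 10)

tree <- rpart(child_mortality ~ ., data = d, method = "anova")

# Importance sums each variable's split improvements, including surrogates
imp <- sort(tree$variable.importance, decreasing = TRUE)
round(100 * imp / sum(imp))  # rescaled so the values sum to about 100

# Horizontal bar chart, most important variable on top
op <- par(mar = c(4, 10, 1, 1))
barplot(rev(imp), horiz = TRUE, las = 1, xlab = "Importance")
par(op)
```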
Discussion
come back to
References
Appendix
(Feel free to remove this section when you submit)
This is a Quarto document. To learn more about Quarto see https://quarto.org. You can use the Render button to see what it looks like in HTML.
Text formatting
Text can be bolded with double asterisks and italicized with single asterisks. Monospace text, such as for short code snippets, uses backticks. (Note these are different from quotation marks or apostrophes.) Links are written like this.
Bulleted lists can be written with asterisks:
- Each item starts on a new line with an asterisk.
- Items should start on the beginning of the line.
- Leave blank lines after the end of the list so the list does not continue.
Mathematics can be written with LaTeX syntax using dollar signs. For instance, using single dollar signs we can write inline math: $(-b \pm \sqrt{b^2 - 4ac})/2a$.
To write math in “display style”, i.e. displayed on its own line centered on the page, we use double dollar signs:
$$x^2 + y^2 = 1$$
Code blocks
Code blocks are evaluated sequentially when you hit Render. As the code runs, R prints out which block is running, so naming blocks is useful if you want to know which one takes a long time. After the block name, you can specify chunk options. For example, echo controls whether the code is printed in the document. By default, output is printed in the document in monospace:
mpg cyl disp hp drat wt qsec vs am gear carb
Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4
Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4
Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1
Hornet 4 Drive 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1
Hornet Sportabout 18.7 8 360 175 3.15 3.440 17.02 0 0 3 2
Valiant 18.1 6 225 105 2.76 3.460 20.22 1 0 3 1
Chunk options can also be written inside the code block, which is helpful for really long options, as we’ll see soon.
mpg cyl disp hp drat wt qsec vs am gear carb
Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4
Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4
Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1
Hornet 4 Drive 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1
Hornet Sportabout 18.7 8 360 175 3.15 3.440 17.02 0 0 3 2
Valiant 18.1 6 225 105 2.76 3.460 20.22 1 0 3 1
Figures
If a code block produces a plot or figure, this figure will automatically be inserted inline in the report. That is, it will be inserted exactly where the code block is.
Notice the use of fig-width and fig-height to control the figure’s size (in inches). These control the sizes given to R when it generates the plot, so R proportionally adjusts the font sizes to be large enough.
Tables
Use the knitr::kable() function to print tables as HTML:
|  | mpg | cyl | disp | hp | drat | wt | qsec | vs | am | gear | carb |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Mazda RX4 | 21.0 | 6 | 160 | 110 | 3.90 | 2.620 | 16.46 | 0 | 1 | 4 | 4 |
| Mazda RX4 Wag | 21.0 | 6 | 160 | 110 | 3.90 | 2.875 | 17.02 | 0 | 1 | 4 | 4 |
| Datsun 710 | 22.8 | 4 | 108 | 93 | 3.85 | 2.320 | 18.61 | 1 | 1 | 4 | 1 |
| Hornet 4 Drive | 21.4 | 6 | 258 | 110 | 3.08 | 3.215 | 19.44 | 1 | 0 | 3 | 1 |
| Hornet Sportabout | 18.7 | 8 | 360 | 175 | 3.15 | 3.440 | 17.02 | 0 | 0 | 3 | 2 |
We can summarize model results with a table. For instance, suppose we fit a linear regression model:
model1 <- lm(mpg ~ disp + hp + drat, data = mtcars)
It is not appropriate to simply print summary(model1) into the report. If we want the reader to understand what models we have fit and what their results are, we should provide a nicely formatted table. A simple option is to use the tidy() function from the broom package to get a data frame of the model fit, and simply report that as a table.
| Term | Estimate | SE | t | p |
|---|---|---|---|---|
| (Intercept) | 19.34 | 6.37 | 3.04 | 0.01 |
| disp | -0.02 | 0.01 | -2.05 | 0.05 |
| hp | -0.03 | 0.01 | -2.34 | 0.03 |
| drat | 2.71 | 1.49 | 1.83 | 0.08 |
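If broom is not available, a similar coefficient table can be built in base R from the coefficient matrix that `summary()` returns. The sketch below is an alternative to `tidy()`, not the template's own method. Column names follow the table above, and values are rounded to two decimals.

```r
# Base-R alternative to broom::tidy(): pull the coefficient matrix from summary()
model1 <- lm(mpg ~ disp + hp + drat, data = mtcars)

coefs <- coef(summary(model1))  # columns: Estimate, Std. Error, t value, Pr(>|t|)
coef_tab <- data.frame(
  Term     = rownames(coefs),
  Estimate = coefs[, "Estimate"],
  SE       = coefs[, "Std. Error"],
  t        = coefs[, "t value"],
  p        = coefs[, "Pr(>|t|)"],
  row.names = NULL,
  stringsAsFactors = FALSE
)
coef_tab[, -1] <- round(coef_tab[, -1], 2)
coef_tab

# In a Quarto chunk, knitr::kable(coef_tab) renders this as a formatted table
```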